Binary Theta-Joins using MapReduce: Efficiency Analysis and Improvements
Abstract
We deal with binary theta-joins in a MapReduce environment, and we make two contributions. First, we show that the best known algorithm to date for this problem can reach the optimal trade-off between the size of the input a reducer can receive and the incurred communication cost when the join selectivity is high. Second, when the join selectivity is low, we present improvements upon the state-of-the-art with a view to decreasing the communication cost and the maximum load a reducer can receive, taking also into account the load imbalance across the reducers.
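The trade-off between reducer input size and communication cost can be made concrete with a small simulation. The sketch below is an illustration only and is not taken from the paper: it assumes a generic one-round scheme in which the |R| x |S| join matrix is cut into a rows x cols grid of reducers, and every name in it (`grid_theta_join`, `rows`, `cols`, `theta`) is hypothetical.

```python
import random
from collections import defaultdict


def grid_theta_join(R, S, theta, rows, cols, seed=0):
    """Simulate a one-round, grid-partitioned binary theta-join.

    Assumed scheme (illustrative, not the paper's algorithm): reducers form
    a rows x cols grid; each R tuple is replicated along one random row,
    each S tuple along one random column.
    Returns (result, communication_cost, max_reducer_load).
    """
    rng = random.Random(seed)
    # (row, col) -> ([R tuples], [S tuples]) received by that reducer
    reducers = defaultdict(lambda: ([], []))

    # Map phase: replicate each R tuple to every cell of one random row.
    for r in R:
        i = rng.randrange(rows)
        for j in range(cols):
            reducers[(i, j)][0].append(r)

    # Map phase: replicate each S tuple to every cell of one random column.
    for s in S:
        j = rng.randrange(cols)
        for i in range(rows):
            reducers[(i, j)][1].append(s)

    # Reduce phase: each reducer joins only what it received. A pair (r, s)
    # meets in exactly one cell (row of r, column of s), so no duplicates.
    result = []
    for r_part, s_part in reducers.values():
        result.extend((r, s) for r in r_part for s in s_part if theta(r, s))

    loads = [len(r_part) + len(s_part) for r_part, s_part in reducers.values()]
    communication_cost = sum(loads)   # total tuples shipped to reducers
    max_load = max(loads, default=0)  # largest input of any single reducer
    return result, communication_cost, max_load


if __name__ == "__main__":
    R = list(range(20))
    S = list(range(20))
    for rows, cols in [(1, 1), (2, 2), (4, 4)]:
        out, cost, load = grid_theta_join(R, S, lambda r, s: r < s, rows, cols)
        print(rows, cols, len(out), cost, load)
```

Under this assumed scheme the communication cost is |R| * cols + |S| * rows, so adding reducers lowers the largest reducer input while raising the total number of tuples shipped, which is the tension the abstract refers to.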
Similar resources
Optimizing Theta-Joins in a MapReduce Environment
Data analysis and processing are important tasks in cloud computing. In this field, the MapReduce framework has become an increasingly popular tool for analyzing large-scale data over large clusters. Compared with parallel relational databases, it offers excellent scalability and good fault tolerance. However, the performance of join operation using MapReduce is not as good as t...
GPU processing of theta-joins
The GPGPU paradigm has recently been employed to accelerate the processing of large amounts of data through the massive parallelism offered by modern GPUs. To date, several techniques have been proposed for the implementation of simple select, aggregate and equality join operations on GPUs. In this paper, we study the efficient implementation of theta-join queries between two r...
Efficient Processing Distributed Joins with Bloomfilter using MapReduce
The MapReduce framework has been widely used to process and analyze large-scale datasets over large clusters. As an essential problem, the join operation across large clusters has attracted more and more attention in recent years due to the utilization of MapReduce. Many strategies have been proposed to improve the efficiency of distributed joins, among which the Bloom filter is a successful one. However, the b...
A Theoretical and Experimental Comparison of Filter-Based Equijoins in MapReduce
MapReduce has become an increasingly popular framework for large-scale data processing. However, complex operations such as joins are quite expensive and require sophisticated techniques. In this paper, we review state-of-the-art strategies for joining several relations in a MapReduce environment and study their extension with filter-based approaches. The general objective of filters is to elim...
Efficient Multi-way Theta-Join Processing Using MapReduce
Multi-way Theta-join queries are powerful in describing complex relations and are therefore widely employed in practice. However, existing solutions from traditional distributed and parallel databases for multi-way Theta-join queries cannot be easily extended to fit a shared-nothing distributed computing paradigm, which is proven to be able to support OLAP applications over immense data volum...